Disclaimer: to ensure that the notebook can be run from (more or less) any point, I load the relevant functions or modules in every cell that uses them. This is generally not good practice, as it adds unnecessary overhead.

0. Image representation as numerical arrays

We start by importing numpy and creating and printing a simple 9x9 checkerboard array
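One way to sketch this in numpy (the notebook's own cell may build the array differently):

```python
import numpy as np

# One way to build a 9x9 checkerboard: 1 where the row and column
# indices have the same parity, 0 elsewhere.
rows, cols = np.indices((9, 9))
checkerboard = ((rows + cols) % 2 == 0).astype(np.uint8)
print(checkerboard)
```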

Then, we import the pyplot and image modules from the plotting library matplotlib. Using it, we can display our checkerboard array in image form:

As another simple example, we will import the data module from the image processing library scikit-image and load a small image of a bush.

First, we want to print the pixel values:

Can you see the bush?

Next, show the image:

1. Pixel-level operations

Now that we have a sense of what a digital image is, let's start manipulating it. We'll begin with simple pixel-level operations

1.1 Basic pixel-level operations

Let's look at a more interesting image. From scikit-image's data module we'll open an example IHC (immunohistochemistry) image and plot it using pyplot.

Seems like we have an RGB image. Let's look at every channel independently.

We can invert the image using the invert function from scikit-image's utilities module:

Let's try some other pixel-level operations. We'll use the exposure module from scikit-image.

  1. A gamma correction applies the nonlinear transform $V_{out} = V_{in}^\gamma$.

  2. A log transform applies $V_{out} = \log(V_{in}+1)$.

  3. A sigmoid transform applies $V_{out} = \frac{1}{1+e^{gain\cdot(\text{cutoff}-V_{in})}}$.

  4. Equalization transforms the intensity histogram of an image to a uniform distribution. It often enhances the contrast of the image

  5. CLAHE (contrast-limited adaptive histogram equalization) works like equalization, but is applied separately to different regions of the image.

Try to apply these by calling the relevant function from skimage.exposure, or by direct calculation.

Play with the different parameters and see how they change the output.
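For reference, the first three transforms can be computed by direct calculation with numpy alone; the toy image and the gamma, gain, and cutoff values below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((8, 8))      # toy image with intensities in [0, 1)

gamma = 2.0
gain, cutoff = 10.0, 0.5

gamma_corrected = img ** gamma                          # V_out = V_in^gamma
log_transformed = np.log(img + 1)                       # V_out = log(V_in + 1)
sigmoid = 1.0 / (1.0 + np.exp(gain * (cutoff - img)))   # sigmoid transform
```

Note that with gamma > 1 the image gets darker, and with gamma < 1 it gets brighter (for intensities in [0, 1]).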

1.2 Image filtering

Spatial filtering is an image processing technique that changes the intensity of each pixel according to the intensities of some neighborhood of pixels.

The kernel of the filter defines the neighborhood and the weight assigned to each pixel in the neighborhood:

This procedure is formally a convolution and is marked by an asterisk: $I_o = I_i\ast f$.

Side note: since convolution in the spatial domain is equivalent to multiplication in the frequency domain, it is sometimes computationally cheaper to perform these calculations in Fourier space.

Side side note: filtering can also be performed in the frequency domain by directly removing a set of frequencies from an image.

The kernel can be of any shape/size, it is applied to each pixel in the image, and the output is a new, filtered, image. The output image is often called the response to the given filter.

Example, local average:
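As a minimal sketch of the local average, here is a 3x3 mean kernel applied to a single-impulse image, with scipy.ndimage.convolve standing in for whatever convolution routine the notebook uses:

```python
import numpy as np
from scipy.ndimage import convolve

kernel = np.ones((3, 3)) / 9.0    # 3x3 local-average kernel: equal weights summing to 1

img = np.zeros((7, 7))
img[3, 3] = 9.0                   # a single bright pixel

response = convolve(img, kernel)  # the image's "response" to the mean filter
```

The single bright pixel is spread evenly over its 3x3 neighborhood, while the total intensity is preserved.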

Filtering is an incredibly versatile tool with which you can emphasize certain features or remove other features.

Image processing operations implemented with filtering include smoothing, sharpening, and edge enhancement.

To implement different image filters, we will use the filters module from scikit-image

1.2.1 Smoothing

Smoothing, aka low-pass filtering, is used for removing high-frequency noise from images. Most commonly, a gaussian kernel is used, but others (e.g. local mean/median) work too. We'll see the effect of gaussian filtering.

Try to change the value of sigma (width of the gaussian) to see how the output changes.
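A quick way to see the effect of sigma, sketched with scipy.ndimage.gaussian_filter (standing in for the skimage equivalent) on synthetic noise:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(1)
noisy = rng.normal(0.0, 1.0, size=(64, 64))   # pure high-frequency noise

smooth_1 = gaussian_filter(noisy, sigma=1)
smooth_3 = gaussian_filter(noisy, sigma=3)    # wider kernel, stronger smoothing
```

The wider the kernel, the more the noise is averaged away, which shows up as a lower variance in the output.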

1.2.2 Sharpening

Sharpening is sometimes used to enhance a blurry (i.e. low-quality) image. A common recipe:

  1. Start with the input image
  2. Apply a gaussian filter with a very narrow kernel
  3. Subtract the filtered image from the input image to keep only the high-frequency components
  4. Amplify the high-frequency components (by a factor alpha) and add them back to the original input image
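These four steps (a procedure commonly known as unsharp masking) can be sketched with scipy.ndimage; the toy image and the value of alpha are illustrative:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Stand-in for a blurry input: a bright square that has already been smoothed.
img = gaussian_filter(np.pad(np.ones((16, 16)), 8), sigma=2)

alpha = 1.5
blurred = gaussian_filter(img, sigma=1)   # step 2: narrow gaussian
high_freq = img - blurred                 # step 3: high-frequency components
sharpened = img + alpha * high_freq       # step 4: amplify and add back
```

The overshoot at edges (the output exceeds the input's maximum) is exactly what makes the result look sharper.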

1.2.3 Edge enhancement

Edge-detecting filters work by measuring the local spatial gradient of an image. Common types are the Sobel, Prewitt, and Roberts filters.

The filters are usually applied in each direction individually, and then the total magnitude of the gradient is calculated:

$|\nabla| = \sqrt{\nabla_x^2+\nabla_y^2}$

Sobel:

$G_x = \begin{bmatrix} 1 & 0 & -1\\ 2 & 0 & -2\\ 1 & 0 & -1 \end{bmatrix} \quad G_y = \begin{bmatrix} 1 & 2 & 1\\ 0 & 0 & 0\\ -1 & -2 & -1 \end{bmatrix}$

Prewitt:

$G_x = \begin{bmatrix} 1 & 0 & -1\\ 1 & 0 & -1\\ 1 & 0 & -1 \end{bmatrix} \quad G_y = \begin{bmatrix} 1 & 1 & 1\\ 0 & 0 & 0\\ -1 & -1 & -1 \end{bmatrix}$

Roberts:

$G_x = \begin{bmatrix} 1 & 0\\ 0 & -1 \end{bmatrix} \quad G_y = \begin{bmatrix} 0 & 1\\ -1 & 0 \end{bmatrix}$

Direct application of edge detectors often results in somewhat noisy responses. A way to overcome this is by first smoothing the image with a gaussian filter and then applying the edge filter. The width of the gaussian kernel will determine the size of edges the filter detects.
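A sketch of this smooth-then-differentiate idea on a synthetic noisy edge, using scipy.ndimage as a stand-in for the skimage filters:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

# A noisy vertical step edge: left half dark, right half bright.
rng = np.random.default_rng(0)
step = np.zeros((32, 32))
step[:, 16:] = 1.0
noisy = step + rng.normal(0, 0.2, step.shape)

smoothed = gaussian_filter(noisy, sigma=2)  # suppress the noise first
grad_x = sobel(smoothed, axis=1)            # then differentiate
grad_y = sobel(smoothed, axis=0)
magnitude = np.sqrt(grad_x**2 + grad_y**2)
```

The gradient magnitude now peaks at the true edge instead of firing all over the noisy background.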

DoG is also the first stage of the Canny edge detection algorithm:

  1. Apply a DoG filter
  2. Apply non-maximum suppression
  3. Apply hysteresis thresholding

1.2.4 Gaussian derivatives

Now let's remember that the application of filters is an associative operation (because convolution is associative!). It is equivalent to apply the gaussian and then the gradient filter to the image, or to first convolve the gradient filter with the gaussian and then apply the result to the image, i.e. $\nabla*(G*I) = (\nabla*G)*I$ where $\nabla$ is the gradient (derivative) filter, $G$ is the gaussian filter, and $I$ is the image.

We can generalize this idea and apply the derivative multiple times, to get a family of interesting filters:

$\nabla^n*G$:



Let's use a second order derivative (aka a Laplacian) and make a ridge detector:

We'll look at a retinal photo where vasculature is an interesting feature

We'll invert the images so that the vasculature appears as bright lines on a dark background

We find ridges in the inverted image by applying a 2nd order gaussian derivative filter. The second order derivative is called a Laplacian:

This commonly used ridge detector is called a Laplacian of Gaussian (LoG)!
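A minimal LoG sketch on a synthetic ridge, using scipy.ndimage.gaussian_laplace (standing in for the skimage equivalent); note the sign flip so that bright ridges give a positive response:

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

# A thin bright horizontal line (a "ridge") on a dark background.
img = np.zeros((32, 32))
img[16, :] = 1.0

# LoG response; negated so that bright ridges come out positive.
response = -gaussian_laplace(img, sigma=2)
```

The response is strongest along the ridge itself, which is exactly what we'll exploit on the vasculature.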

(We'll get back to this image in a bit)

1.3 Masking

A mask is a binary image (0s and 1s) that typically separates a given input image into Foreground (interesting) and Background (boring) regions, or for picking a region-of-interest (ROI). A mask is applied to an image by element-wise multiplication. The size of a mask must be identical to the size of the image it's applied to.

Let's begin by creating a simple circular mask. We'll create an array where the value at each point is its distance from the center of the image, and display it as an image:

Of these points, we'll keep those within a circle of radius 100 and display the result as an image:
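In plain numpy, the distance-array-and-circle construction can be sketched like this (the 512x512 size is illustrative; adjust it to the image you want to mask):

```python
import numpy as np

h, w = 512, 512
rows, cols = np.indices((h, w))
# Distance of every pixel from the image center.
dist = np.sqrt((rows - h / 2) ** 2 + (cols - w / 2) ** 2)

# Keep everything within a radius of 100 pixels.
mask = (dist < 100).astype(np.uint8)
```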

This object is a mask. If you multiply this matrix of 0s and 1s with an image of the same size, only the parts where the mask is 1 will remain

Let's apply this mask to our histology image.

What happens if we invert the mask?

Just for closure, let's see what happens when we look at the full RGB image and try to apply the mask

Whoops. Seems like something is wrong. Our problem is that numpy didn't know how to multiply a 512x512x3 image by a 512x512 mask. Numpy makes solving this very easy: we add a singleton dimension to the mask (look up broadcasting in your spare time).
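A minimal numpy sketch of the fix (the stand-in image and mask are illustrative):

```python
import numpy as np

rgb = np.ones((512, 512, 3))               # stand-in for the RGB image
mask = np.zeros((512, 512))
mask[100:400, 100:400] = 1.0

# rgb * mask raises an error: (512, 512, 3) and (512, 512) don't broadcast.
# A trailing singleton dimension makes the mask (512, 512, 1), which
# broadcasts against all three color channels:
masked = rgb * mask[:, :, np.newaxis]
```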

1.4 Thresholding

1.4.1 Simple thresholding

Thresholding an image is the process of setting an intensity (or intensities) for separating the different components of an image.

In the simplest case, the foreground and background have very different intensities. In that case thresholding is just clustering pixels by their intensity levels.

To find the right threshold, let's examine a histogram of pixel intensity values

Pick an appropriate threshold, by eye, and see if you can remove the background. What happens when you increase or decrease the threshold?

Our mask looks ok, but it has a lot of salt & pepper speckle noise. Why is that?

We can try and use what we learned before about filtering to clean up our results. What filter should we use?

It's usually a good idea to despeckle an image with a narrow gaussian filter before creating a mask!

1.4.2 Morphological operations

Morphology is a broad set of image processing operations that process images based on shapes. In a morphological operation, each pixel in the image is adjusted based on the value of other pixels in its neighborhood. By choosing the size and shape of the neighborhood, you can construct a morphological operation that is sensitive to specific shapes in the input image. (explanation from Mathworks)

Morphological operations are based around a structuring element, which is a small binary image, often of a disk or a square. The structuring element is positioned at all possible locations in the image and it is compared with the corresponding neighbourhood of pixels. Some operations test whether the element "fits" within the neighbourhood, while others test whether it "hits" or intersects the neighbourhood.

Common operations for image processing

Erosion - output image =1 wherever the structuring element fits (erodes the mask)

Dilation - output image =1 wherever the structuring element hits (expands the mask)

Opening - Erosion followed by dilation (opens gaps in spots where the mask is weakly connected)

Closing - Dilation followed by erosion (closes holes in the mask)

A very thorough explanation of morphological operations can be found here
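A small sketch of opening and closing with scipy.ndimage (standing in for skimage.morphology); the toy mask, with one hole and one speckle, is made up:

```python
import numpy as np
from scipy.ndimage import binary_opening, binary_closing

mask = np.zeros((12, 12), dtype=bool)
mask[3:9, 3:9] = True      # a 6x6 square object
mask[5, 5] = False         # ...with a one-pixel hole
mask[0, 0] = True          # ...and an isolated speckle

selem = np.ones((3, 3), dtype=bool)   # 3x3 square structuring element

opened = binary_opening(mask, structure=selem)   # erosion then dilation
closed = binary_closing(mask, structure=selem)   # dilation then erosion
```

Opening removes the speckle (the structuring element never fits inside it), while closing fills the hole.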

1.4.3 Masking actual data

We'll repeat the thresholding procedure using an actual microscopy image of fluorescent nuclei

Again, let's plot a histogram of intensity values

And again we'll pick a value by eye:

Not bad! But also not very scalable. If we have 100s of images we can't look at them one-by-one and find thresholds by eye. Next, we'll look at some methods for automatically finding the thresholds.

1.5 Automated threshold calculation

There is a very large list of algorithms for threshold calculation out there that are optimized for different situations. We will briefly review a few of the most common ones.

1.5.1 Iterative mean thresholding

Algorithm:

  1. Start with some threshold $T_i$
  2. Compute the means $m_0$ and $m_1$ of the BG and FG
  3. Update $T_{i+1} = \frac{m_0+m_1}{2}$
  4. Repeat until it converges
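The algorithm above can be sketched in a few lines of numpy (the bimodal toy data and the convergence tolerance are illustrative):

```python
import numpy as np

def iterative_mean_threshold(img, tol=0.5):
    t = img.mean()                        # step 1: initial guess
    while True:
        m0 = img[img <= t].mean()         # mean of the background class
        m1 = img[img > t].mean()          # mean of the foreground class
        t_new = (m0 + m1) / 2             # step 3: midpoint update
        if abs(t_new - t) < tol:          # step 4: stop on convergence
            return t_new
        t = t_new

# Bimodal toy data: dark background around 10, bright foreground around 200.
rng = np.random.default_rng(0)
pixels = np.concatenate([rng.normal(10, 2, 500), rng.normal(200, 10, 100)])
t = iterative_mean_threshold(pixels)
```

For well-separated classes the update converges in a couple of iterations to roughly the midpoint of the two class means.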

1.5.2 Otsu thresholding

The algorithm exhaustively searches for the threshold that minimizes the intra-class variance, defined for a given threshold $T$ as a weighted sum of variances of the two classes: $\sigma^2_w(T)=\omega_0(T)\sigma^2_0(T)+\omega_1(T)\sigma^2_1(T)$

For 2 classes, minimizing the intra-class variance is equivalent to maximizing inter-class variance, which is much easier to calculate: \begin{align} \sigma^2_b(T) & =\sigma^2-\sigma^2_w(T)=\omega_0(\mu_0-\mu_T)^2+\omega_1(\mu_1-\mu_T)^2 \\ & =\omega_0(T) \omega_1(T) \left[\mu_0(T)-\mu_1(T)\right]^2 \end{align}
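A direct numpy sketch of this exhaustive search over the inter-class variance (the binning scheme is one reasonable choice, not necessarily what skimage's threshold_otsu does internally):

```python
import numpy as np

def otsu_threshold(img, nbins=256):
    """Exhaustive search maximizing sigma_b^2 = w0 * w1 * (mu0 - mu1)^2."""
    counts, edges = np.histogram(img, bins=nbins)
    centers = (edges[:-1] + edges[1:]) / 2

    w0 = np.cumsum(counts)                       # pixels in class 0 (at or below the cut)
    w1 = np.cumsum(counts[::-1])[::-1]           # pixels in class 1 (above the cut)
    cum0 = np.cumsum(counts * centers)
    cum1 = np.cumsum((counts * centers)[::-1])[::-1]
    mu0 = cum0 / np.maximum(w0, 1)               # class means (guard empty classes)
    mu1 = cum1 / np.maximum(w1, 1)

    # Inter-class variance for a cut between bin i and bin i+1.
    sigma_b = w0[:-1] * w1[1:] * (mu0[:-1] - mu1[1:]) ** 2
    return centers[np.argmax(sigma_b)]

rng = np.random.default_rng(0)
pixels = np.concatenate([rng.normal(10, 2, 500), rng.normal(200, 10, 100)])
t = otsu_threshold(pixels)
```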

1.5.3 Triangle thresholding

Algorithm:

  1. Draw a straight line between the histogram peak and the brightest value.
  2. From every point on that line, draw the shortest connecting line to the histogram.
  3. Find longest of these connecting lines.
  4. Threshold is set at the intersection of that line and the curve.

note: Triangle thresholding is good for situations where the image is mostly background, and there is no clear "peak" of bright pixels.

scikit-image's filters module implements a large variety of thresholding algorithms. Let's apply the ones we just learned about.

Let's look at the resulting masks we get with each of these thresholds

Let's briefly look back at the retinal images and try to mask only the vasculature:

load the image and apply LoG filter:

Now, let's do the same procedure of automatically finding thresholds:

Play around with the width of the gaussian. Which thresholding algorithm works best in this case?

1.5.4 Local thresholding

All of the methods we saw so far are global in the sense that the same threshold is applied to the whole picture. Sometimes we can have an image with vastly different intensity distributions at different locations. Using local thresholding, we can overcome such cases.

Let's compare the results from a global (Otsu) and a local threshold.
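A sketch of the idea with scipy.ndimage, comparing a single global threshold to a local-mean threshold on an image with a synthetic illumination gradient (the offset of 10 plays the role of skimage's offset parameter):

```python
import numpy as np
from scipy.ndimage import uniform_filter

# Synthetic image: a strong left-to-right illumination gradient with
# dim foreground spots scattered everywhere.
gradient = np.tile(np.linspace(0, 100, 64), (64, 1))
spots = np.zeros((64, 64))
spots[::8, ::8] = 30.0
img = gradient + spots

global_mask = img > img.mean()             # one threshold for the whole image
local_mean = uniform_filter(img, size=15)  # per-pixel neighborhood mean
local_mask = img > local_mean + 10         # threshold relative to the neighborhood
```

The global threshold misses the spots on the dark side and keeps plain background on the bright side; the local threshold handles both.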

2. Image segmentation

Image segmentation is the process of partitioning a digital image into multiple segments. The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze.

2.1 Connected components

After we generate a mask, the simplest segmentation involves simply looking for regions in the mask that are connected and labeling each one as a separate object.

We begin by generating a simple mask using the triangle threshold method:

scikit-image's measure module implements a variety of useful methods for segmentation. The label function returns a labeled image of connected components (CCs). Each CC is uniquely numbered by an integer.

$\begin{bmatrix} 1 & 1 & 0 & 0 & 2\\ 1 & 1 & 0 & 2 & 2\\ 0 & 0 & 0 & 0 & 0\\ 0 & 3 & 0 & 4 & 4\\ 0 & 0 & 0 & 4 & 4\\ \end{bmatrix}$
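A sketch of CC labeling using scipy.ndimage.label (which defaults to 4-connectivity, unlike skimage.measure.label) on a binary mask that reproduces the example above:

```python
import numpy as np
from scipy.ndimage import label

mask = np.array([[1, 1, 0, 0, 1],
                 [1, 1, 0, 1, 1],
                 [0, 0, 0, 0, 0],
                 [0, 1, 0, 1, 1],
                 [0, 0, 0, 1, 1]])

labels, n_components = label(mask)   # each CC gets its own integer label
```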

We can easily generate a mask for a specific CC using the binary operation labels==i

Problem with simple CC segmentation : overlapping objects

We often really care about having only a single object per label. Using CC, any overlapping objects will merge into one blob:

These problems can be partially resolved using morphological operations, but there's no silver bullet

2.2 Watershed based segmentation

2.2.1 The watershed algorithm

The watershed transformation treats the image it operates upon like a topographic map, with the brightness of each point representing its height, and finds the lines that run along the tops of ridges.

More precisely, the algorithm goes as follows:

  1. Label all local minima (i.e. $S_1$, $S_2$)
  2. Move to the next higher intensity level
  3. Assign to each point the label of its closest labeled set. Points equidistant to multiple sets are labeled as boundaries and their intensities set to 0
  4. Repeat until all points are labeled

Let's start with a very naive application. We will invert the image, and then simply apply the watershed function from the scikit-image morphology module. The function returns a labeled image.

So this clearly didn't work. Why? How do we fix it?

Noise generates a ton of local minima. Each gets its own basin. This leads to massive oversegmentation.

Watershed segmentation is only a part of a segmentation pipeline. Preprocessing (denoising, smoothing, seeding minima) of the image is CRUCIAL for it to work well.

The first thing we'll do is to apply the mask that we found before. This is simply done by adding a mask argument to the watershed function.

So we got rid of all the BG regions, but we are still oversegmenting. Why?

Let's try to smooth the image and get rid of the many local minima. How wide should the gaussian kernel be?

We're starting to get somewhere!! Can we do better?

We can do more to help the algorithm by providing local markers (seeds) from which to start the process

We will find seeds by calculating local maxima over areas that are larger than 30x30 pixels using the footprint argument for the function peak_local_max

This is pretty good! We're still getting a few errors here and there, but there's no big systematic over- or under- segmentation. This is a typical good result when dealing with real data.

3. Feature extraction

Feature extraction is a process of dimensionality reduction by which an initial raw image is reduced to a list of objects and attributes

3.1 Extracting region properties

scikit-image's measure module implements a method called regionprops that accepts a labeled mask of connected components, and, optionally, a corresponding image, and returns a list. Each object on the list contains useful data about the size, shape, position, and intensity (see the full list here) of a specific component.

The length of the list is equal to the total number of objects detected.

We'll start by extracting the number of CC we found and the area of each CC
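For the area computation specifically, here is a regionprops-free sketch with numpy and scipy (the two toy rectangles are made up):

```python
import numpy as np
from scipy.ndimage import label

mask = np.zeros((20, 20), dtype=int)
mask[2:6, 2:6] = 1       # object of area 16
mask[10:13, 10:15] = 1   # object of area 15

labels, n_objects = label(mask)
# Pixel count per label; label 0 is the background, so drop it.
areas = np.bincount(labels.ravel())[1:]
```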

We can look at each individual object found

Let's use a scatter plot to compare our results to the image

Or even nicer scatter plots!

You can even draw your points directly on the image!

Let's define some useful functions for converting a list of props into a pandas dataframe. These should become obsolete soon since the new version of scikit-image will have this functionality

Now, to get all the properties in table/dataframe form we simply run:

Finally, if we imaged our cells in multiple channels, we would want to use the same segmented nuclei to measure the intensities of the other channels.

Add these new features to the pandas dataframe

Sometimes it's easier to see a bimodal distribution in log scale

We can compare distributions of different channels

And so, we've successfully implemented a simple image segmentation pipeline for multicolor microscopy data.

4. Intro to tracking

Sometimes we image the same object at different times. It's useful to have a way to track the object and see how it changes dynamically.

It is usually more efficient to track objects that have been segmented, using the extracted features rather than the raw image data. As such, this is technically not an image processing method, but rather a data analysis method. However, it's a very common task in microscopy experiments.

While we won't have time to build a full tracking pipeline, we will try to understand the basics of object tracking. The core building block of a tracking pipeline is a method that matches detected objects from 2 different timepoints.

Let's start by creating a small toy dataset. The dataset will have 2 timepoints (before and after). We will generate random positions for the "before" points. We will then generate random velocities and calculate the "after" points. Our goal is to match "before" points to the correct "after" points:

4.1 Nearest neighbor tracking

This is the simplest thing you could do. For every point, find the nearest neighbor and match them:

Notice that matching 0->1 is not always the same as matching 1->0, so make sure to operate on the right axes:

You could use an efficient data structure like a k-d tree to calculate the nearest neighbors. In our case we have a small dataset, so we can just calculate the distances directly. We'll use the distance_matrix method from scipy.spatial and the argmin function from numpy to find the matrix index where the distance is minimal. Make sure to specify the axis for argmin so it applies row by row.
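A sketch of this nearest-neighbor matching on a toy dataset (the sizes and noise parameters are illustrative):

```python
import numpy as np
from scipy.spatial import distance_matrix

rng = np.random.default_rng(0)
before = rng.random((20, 2)) * 100                 # positions at the first timepoint
after = before + rng.normal(0, 0.5, size=(20, 2))  # small random displacements

d = distance_matrix(before, after)   # d[i, j] = distance from before_i to after_j
match_01 = d.argmin(axis=1)          # for each "before" point, its nearest "after"
match_10 = d.argmin(axis=0)          # for each "after" point, its nearest "before"
```

Because the points were generated in the same order, the correct match for point i is simply i, which makes it easy to score the matching.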

We're great at this! We can go home now! Let's try some more diffusion just for fun:

OK, maybe not perfect. Can we do better? What if, instead of trying to match every individual object, we try to find a global matching that minimizes the TOTAL distance of all connections?

4.2 Global optimization using linear assignment.

Let's look at the following arrangement:

If we just match nearest neighbors we end up with this:

That's not right. Instead, what if we consider every possible configuration of connections between the points and find the one that minimizes the total distance (which we will call the "cost" of a configuration)?

Or in the words of Wikipedia

Which in our case will give us:

There are a few good algorithms that solve the linear assignment problem efficiently. Most famous ones are called the Hungarian Algorithm and the Jonker-Volgenant (JV) algorithm. We will use the JV algorithm which is implemented in scipy.optimize.linear_sum_assignment:
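A sketch with scipy.optimize.linear_sum_assignment on a three-point arrangement where greedy nearest-neighbor matching fails but the global assignment succeeds (the coordinates are made up):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial import distance_matrix

# Three collinear points drifting uniformly to the right by 6.
before = np.array([[0.0, 0.0], [10.0, 0.0], [20.0, 0.0]])
after = before + np.array([6.0, 0.0])

cost = distance_matrix(before, after)
row_ind, col_ind = linear_sum_assignment(cost)  # globally optimal matching
total_cost = cost[row_ind, col_ind].sum()
```

Here the middle point's nearest neighbor is the wrong one (the shifted copy of the first point is only 4 away, while its own copy is 6 away), but minimizing the total cost recovers the correct pairing.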

Play around with different drift/diffusion parameters and see how they affect performance:

4.3 Using more object features for tracking

So far we've only looked at the position of an object. In most cases, many other features such as object size, shape, intensity, etc. can be used to improve our tracking results. How can we incorporate other information into our tracking?

Let's build a tracking dataset, but this time also include an intensity feature. For each object, we'll sample an intensity from a log-normal distribution at the first timepoint. We'll assume that this intensity comes from a channel that is more or less stable between the timepoints, and only add a noise term to get the intensities at the second timepoint:

The choice of how to incorporate this information into the cost function is up to the user/developer. Different common software do it differently and there is no one right answer.

In principle, any feature can be added to your cost function. The approach is also not sensitive to dimensionality, so tracking cells in 3D works the same way and is equally efficient.

Done!